Similar resources
Bottom-k document retrieval
We consider the problem of retrieving the k documents from a collection of strings where a given pattern P appears least often. This has potential applications in data mining, bioinformatics, security, and big data. We show that adapting the classical linear-space solutions for this problem is trivial, but the compressed-space solutions are not easy to extend. We design a new solution for this ...
Optimal Top-k Document Retrieval
Let D be a collection of D documents, which are strings over an alphabet of size σ, of total length n. We describe a data structure that uses linear space and reports the k most relevant documents that contain a query pattern P, which is a string of length p, in time O(p / log_σ n + k), which is optimal in the RAM model in the general case where lg D = Θ(log n), and involves a novel RAM-optimal suf...
Top-K Color Queries for Document Retrieval
In this paper we describe a new efficient (in fact optimal) data structure for the top-K color problem. Each element of an array A is assigned a color c with priority p(c). For a query range [a, b] and a value K, we have to report K colors with the highest priorities among all colors that occur in A[a..b], sorted in reverse order by their priorities. We show that such queries can be answered in...
Top-k document retrieval in optimal space
We present an index for top-k most frequent document retrieval whose space is |CSA| + o(n) + D log(n/D) + O(D) bits, and its query time is O(log k log^(2+ε) n) per reported document, where D is the number of documents, n is the sum of lengths of the documents, and |CSA| is the space of the compressed suffix array for the documents. This improves over previous results for this problem, whose space complex...
Time-Optimal Top-k Document Retrieval
Let D be a collection of D documents, which are strings over an alphabet of size σ, of total length n. We describe a data structure that uses linear space and reports the k most relevant documents that contain a query pattern P, which is a string of length p packed in p / log_σ n words, in time O(p / log_σ n + k). This is optimal in the RAM model in the general case where log D = Θ(log n), and involv...
Journal
Journal title: Journal of Discrete Algorithms
Year: 2015
ISSN: 1570-8667
DOI: 10.1016/j.jda.2014.12.009